Adaptive Learning with Unknown Information Flows

Neural Information Processing Systems

An agent facing sequential decisions characterized by partial feedback needs to strike a balance between maximizing immediate payoffs based on available information and acquiring new information that may be essential for maximizing future payoffs. This trade-off is captured by the multi-armed bandit (MAB) framework, which has typically been studied and applied in settings where, at each time epoch, payoff observations are collected only on the actions selected at that epoch. In this paper we introduce a new, generalized MAB formulation in which additional information on each arm may appear arbitrarily throughout the decision horizon, and we study the impact of such information flows on the achievable performance and on the design of efficient decision-making policies. By obtaining matching lower and upper bounds, we characterize the (regret) complexity of this family of MAB problems as a function of the information flows. We introduce an adaptive exploration policy that, without any prior knowledge of the information arrival process, attains the best performance (in terms of regret rate) achievable when the information arrival process is known a priori. Our policy uses dynamically customized virtual time indexes to endogenously control the exploration rate based on the realized information arrival process.
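The abstract's key mechanism is a per-arm virtual time index that advances when auxiliary observations arrive, so that "free" information reduces the need for costly exploration. The paper's exact policy is not reproduced here; the following is a simplified, illustrative epsilon-greedy sketch of that idea, in which all class, method, and parameter names (e.g. `VirtualTimeEpsilonGreedy`, `observe_auxiliary`, the scale `c`) are assumptions made for exposition, not the authors' construction:

```python
import random

class VirtualTimeEpsilonGreedy:
    """Illustrative sketch (not the paper's exact policy): an epsilon-greedy
    bandit whose exploration rate decays with a per-arm 'virtual time index'
    that advances on both pulls and auxiliary reward observations, so that
    external information flows endogenously reduce exploration."""

    def __init__(self, n_arms, c=1.0):
        self.n_arms = n_arms
        self.c = c                      # exploration scale (assumed parameter)
        self.counts = [0] * n_arms      # observations per arm (pulls + auxiliary)
        self.values = [0.0] * n_arms    # running mean reward per arm
        self.virtual_t = [1] * n_arms   # virtual time index per arm

    def _update(self, arm, reward):
        # Incremental running-mean update of the arm's reward estimate.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

    def observe_auxiliary(self, arm, reward):
        # Auxiliary information arrival: refine the estimate and advance the
        # arm's virtual clock, lowering its future exploration probability.
        self._update(arm, reward)
        self.virtual_t[arm] += 1

    def pull(self, arm, reward):
        # Feedback from an action the agent actually selected.
        self._update(arm, reward)
        self.virtual_t[arm] += 1

    def select_arm(self):
        # Pull each arm once before trusting the estimates.
        for arm in range(self.n_arms):
            if self.counts[arm] == 0:
                return arm
        # Explore with probability ~ c / (slowest virtual clock),
        # otherwise exploit the empirically best arm.
        eps = min(1.0, self.c / min(self.virtual_t))
        if random.random() < eps:
            return random.randrange(self.n_arms)
        return max(range(self.n_arms), key=lambda a: self.values[a])
```

In this sketch, a burst of auxiliary observations on an arm advances its virtual clock just as pulls would, so the realized information arrival process (rather than calendar time) drives the exploration schedule.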


Reviews: Adaptive Learning with Unknown Information Flows

Neural Information Processing Systems

Summary: The paper proposes a new multi-armed bandit (MAB) formulation in which the agent may observe reward realizations of some of the arms arbitrarily before each round. The paper studies the impact of these information flows on regret performance by deriving a regret lower bound as a function of the information flows. Moreover, the paper proposes an adaptive exploration policy matching the regret lower bound. However, the insight offered by this setting is somewhat unclear to the reviewer. Therefore, the reviewer suggests voting for "weak accept".


Adaptive Learning with Unknown Information Flows

Gur, Yonatan, Momeni, Ahmadreza

Neural Information Processing Systems